INTRODUCTION

Teitelman's Scheme :

Warren Teitelman presented a novel scheme for real-time character
recognition in his master's thesis, submitted in June of 1963. A
rectangle, in which the character is to be drawn, is divided into two parts,
one shaded and the other unshaded. Using this division, a computer converts
characters into ternary vectors in the following way. If the pen enters the
shaded region, a 1 is added to the vector. When the unshaded region is
entered, a 0 is appended. Finally, if the pen is lifted from the writing
surface, a w is generated. Figure 1 illustrates the basic idea he used.
Thus, with the shading shown, the character V is converted to 1 w 1 0. A V
drawn without lifting the pen would yield a 1 0 1. A T gives 1 w 1, and so on.
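
The conversion rule can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the nine numbered cells (1, 2, 3
across the top, then clockwise around the edge, with 9 in the center), the
choice of the top row as the shaded region, and the name `to_ternary` are
all hypothetical and are not taken from Teitelman's program or from figure 1.

```python
def to_ternary(path, shaded):
    """Convert a pen path into a ternary vector of '1', '0' and 'w'.

    path   -- cell numbers visited by the pen, with the string 'w'
              wherever the pen is lifted (hypothetical encoding)
    shaded -- set of cell numbers that make up the shaded region
    """
    vector = []
    previous = None              # symbol of the region the pen is in now
    for step in path:
        if step == "w":
            vector.append("w")
            previous = None      # the next stroke always emits a symbol
            continue
        symbol = "1" if step in shaded else "0"
        if symbol != previous:   # emit only when a region is entered
            vector.append(symbol)
            previous = symbol
    return vector


TOP_ROW = {1, 2, 3}              # assumed shading, for illustration only

# An L drawn in one stroke: down the left side, then along the bottom.
print(to_ternary([1, 8, 7, 6, 5], TOP_ROW))
# An X drawn in two strokes through the center.
print(to_ternary([1, 9, 5, "w", 3, 9, 7], TOP_ROW))
```

Moving between two cells of the same shading adds nothing to the vector,
which is why a long stroke can still produce a short code.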

Notice that each character may yield several vectors, depending on
just how it is drawn. The vectors to be stored, then, depend upon the style
of the user as well as the division of the rectangle into shaded and
unshaded regions.

When the pen is first applied,

Figure 1. Teitelman's Scheme.

In order to conserve storage space and reduce search time, the character
vectors of Teitelman's scheme are stored in a tree-like structure like
that shown in figure 2. Notice that the tree is essentially binary: only
two branches can grow from a single node. This follows since a w may
be followed only by a 1 or 0; a 0 only by 1 or w; and a 1 only by 0 or w.

Figure 2. A vector storage tree.

Since all figures are completed by lifting the pen, it shall be the
convention to drop the final w.
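
A vector storage tree of this kind behaves like a trie keyed on the three
symbols. The following is a minimal sketch, assuming a dictionary-based node
layout; the names `new_node`, `insert` and `lookup` and the sample vectors
are hypothetical and do not describe the original storage format.

```python
def new_node():
    """A tree node: outgoing branches plus the characters stored here."""
    return {"children": {}, "characters": []}


def insert(tree, vector, character):
    """Store `character` at the node reached by following `vector`."""
    node = tree
    for symbol in vector:
        node = node["children"].setdefault(symbol, new_node())
    node["characters"].append(character)


def lookup(tree, vector):
    """Return the characters stored at the end of `vector`, if any."""
    node = tree
    for symbol in vector:
        node = node["children"].get(symbol)
        if node is None:
            return []
    return node["characters"]


tree = new_node()
insert(tree, ["1", "0"], "L")
insert(tree, ["1", "0", "w", "1", "0"], "X")
insert(tree, ["1", "0"], "C")    # a second character lands on the same node

print(lookup(tree, ["1", "0"]))  # both characters share this position
```

Vectors with a common prefix share the nodes along that prefix, which is
where the saving in storage and search time comes from.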

Since several characters often end up at the same position on the
tree, Teitelman elected to resolve ambiguity by combining the information
from several different region partitionings and their associated trees.
The regions he used are shown in figure 3. His program takes a weighted
look at the characters suggested by each tree.

Figure 3. Teitelman's regions.



Comment : 

Teitelman claims his scheme is superior to other character recognition
methods because the program uses the order in which parts of the character
are drawn. While this is no doubt an important factor, I feel that
something should also be said about the program's sensitivity to
connectedness. By this I mean that the program notes not only the presence
of lines, but also the general areas they connect. For example, Teitelman's
program would respond to figure 4-a by indicating "there is a line
connecting the upper left corner to the lower left corner," whereas to
figure 4-b the response would be quite different. Recognition schemes based
on template matching might equate the two figures, indicating "there are
vertical lines in the upper left, middle left, and lower left corners."

Figure 4. A connected and an unconnected sample.

Finding Good Partitions : 

Suppose the input rectangle is divided into nine "atomic" areas as
in figure 5. If the part to be shaded is selected from combinations of
these atomic regions, one must suffer with 2^9 or 512 possibilities.* The
best combinations will obviously depend upon the character set to be
recognized. But to construct and evaluate all possible partitions and
associated trees would be exceedingly tedious. On the other hand, finding
any sort of analytic method of optimum partitioning seems equally
formidable, if not impossible. Hence the problem of finding a partition
that is in some sense good seems best approached heuristically.

Figure 5. Atomic areas.

Two simplifications were made here. First, I assumed that only one
partitioning is available for the recognition process. And second, that
only one version of each character is to be learned and recognized. The
results are sufficiently enlightening that these assumptions seem justified.

* If duals formed by reversing shaded and unshaded regions are eliminated,
there are still 256 possibilities.
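
The two counts are easy to confirm by enumeration. The sketch below assumes
only that the nine atomic areas are numbered 1 through 9; the helper name
`canonical` is hypothetical.

```python
from itertools import combinations

CELLS = frozenset(range(1, 10))      # nine atomic areas, numbered 1..9

# Every subset of the nine cells is a candidate shaded region.
regions = [frozenset(c)
           for k in range(len(CELLS) + 1)
           for c in combinations(sorted(CELLS), k)]


def canonical(region):
    """Pick one representative from a region and its dual (complement).

    Exactly one of the two contains cell 1, so that one is kept.
    """
    return region if 1 in region else CELLS - region


without_duals = {canonical(r) for r in regions}
print(len(regions), len(without_duals))   # 512 256
```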



THE PROGRAM

Input :

The characters were converted to number strings by hand. First the
character set to be recognized was printed into rectangles. Then a
clear plastic template bearing the numbered atomic regions was placed on
each character in turn. The course of the pen through the atomic
regions could then be easily read off and put into the machine in the form
of a number list. See figure 6.




T converts to 2 9 6 w 1 2 3

Figure 6. Conversion of characters to number sequences. 

Atomic Regions : 

Rather than build the shaded regions from the atomic regions of
figure 5 as Teitelman did, I chose those of figure 7.




Figure 7. Atomic areas used.



I think this set is somewhat better in accommodating letters
with diagonal strokes (K, V, A, and X, among others). The
advantage is that minor variants are not split into different
sequences of input numbers. This is desirable since it renders
the assumption of only one version per character somewhat less
unrealistic. X, for example, yields the sequence 1 9 5 w 3 9 7
with my scheme, even if its position and slant are slightly
altered. But using Teitelman's layout, the first stroke alone
yields four variants as shown in figure 8. Coupled with four
for the second stroke, X would have 16 possible sequences.

Figure 8. Variants of first stroke of X.



Tree Criteria :

As various partitionings are proposed, there must be some
measure of how good the trees they generate are. After some
thought, two qualities were selected as most important: first,
the number of branches should be small, and second, instances
of more than one character at a node should be rare. Subroutines
were written to examine tree structures for these qualities and
return with a pair of numbers. The first number was just the
number of branches. The second was the probability of error,
given that all characters are equally likely and that when several
characters are found at a node, one is selected at random.
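
The pair of numbers can be computed with a single walk over the tree. The
sketch below is my reading of the two definitions, not the original
subroutines: a node holding k characters contributes (k - 1)/N to the
probability of error, since each of its k equally likely characters is
missed with probability (k - 1)/k under a random choice. The names
`build_tree` and `tree_criteria` and the four sample vectors are
hypothetical.

```python
from fractions import Fraction


def build_tree(vectors):
    """Build a storage tree from a dict mapping character -> vector."""
    tree = {"children": {}, "characters": []}
    for character, vector in vectors.items():
        node = tree
        for symbol in vector:
            node = node["children"].setdefault(
                symbol, {"children": {}, "characters": []})
        node["characters"].append(character)
    return tree


def tree_criteria(tree, n_characters):
    """Return (number of branches, probability of error) for a tree."""
    branches = 0
    error = Fraction(0)
    stack = [tree]
    while stack:
        node = stack.pop()
        branches += len(node["children"])      # one branch per child
        k = len(node["characters"])
        if k > 1:                              # shared node: random choice
            error += Fraction(k - 1, n_characters)
        stack.extend(node["children"].values())
    return branches, error


sample = {
    "A": ("1", "0"),
    "B": ("1", "0"),          # shares a node with A
    "C": ("1", "w", "1"),
    "D": ("0",),
}
print(tree_criteria(build_tree(sample), len(sample)))
```

Under this reading, two confused pairs among 26 letters give a probability
of error of 2/26.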

The branch number, B, and the probability-of-error number, P,
are combined into an overall badness factor, W, by the following formula:
W = B^2 + c P. This formula was arrived at by certain intuition-guided
considerations designed to provide a reasonable balance between
probability of error and number of branches. Ideally there would be just as
many branches as there were characters and no characters would be confused.

B was squared since the penalty for branches should be small until
there are somewhat more branches than characters, but then the penalty
should rapidly become severe. P was entered linearly since there seemed
to be no good reason for another form. The weighting constant, c, was
selected so that the contribution from each term is the same when there are
twice as many branches as characters and when the probability of error is
1/10, both conditions seeming equally bad to me.
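
The stated condition fixes c once the number of characters, N, is known.
Taking the branch term as B squared, as the paragraph above describes,
equal contributions at B = 2N and P = 1/10 require (2N)^2 = c/10, so
c = 40 N^2. This value is my derivation from the text and is not quoted
from the program; the function names below are hypothetical.

```python
from fractions import Fraction


def weighting_constant(n_characters):
    """c such that B^2 equals c*P when B = 2N and P = 1/10 (assumed)."""
    return 40 * n_characters ** 2


def badness(branches, p_error, n_characters):
    """Overall badness factor W = B^2 + c*P."""
    return branches ** 2 + weighting_constant(n_characters) * p_error


# The two (B, P) pairs reported for the block capitals, with N = 26.
print(weighting_constant(26))
print(badness(54, Fraction(5, 26), 26))
print(badness(64, Fraction(2, 26), 26))
```

With this value of c, the pair (64, 2/26) scores lower than (54, 5/26), so
trading ten extra branches for three fewer confusions counts as an
improvement.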

W = P (c ? B) is an attractive alternative formula I have not explored. 

The Heuristics : 

The program itself was straightforward. A flowchart is given in
figure 9. Its goal was to take an existing partitioning and improve it
by either adding or deleting a single atomic area. It was guided in this
by four simple heuristics: two were specifically designed to reduce the
number of branches, and two to reduce the probability of error. Which
pair was tried first depended on the particular weaknesses of the existing
tree. Recall that the inspection routine returns not with just an overall
merit factor, but rather with a branch number and a probability-of-error
number. If the contribution from the branches to the badness factor is
greater than that of the probability of error, then B^2 - cP > 0 and the
branch heuristics are tried first. Otherwise the probability heuristics
are tried first.

The first branch reduction heuristic is simply to remove from the
shaded region an atomic area that has no common border with any other in
the shaded region except possibly the center. Intuitively this should
reduce the length of many vectors, as it does for the D vector in figure 10.

Figure 9. Program flowchart.

Figure 10. Branch heuristic 1.

The second branch reduction heuristic adds an atomic area that
shares both its non-center borders with areas already in the region.
Note how this reduces the length of the L vector in figure 11, for example.

Figure 11. Branch heuristic 2.



Since reducing the number of branches increases the likelihood of
finding more than one character at a node, as one might expect, the
probability-of-error reducing heuristics complement the branch reducing
heuristics. The first probability heuristic involves adding a separated
region; the second involves dropping an interior region. See figures
12 and 13.

Figure 12. Probability heuristic 1.

Figure 13. Probability heuristic 2.



Note that if the first pair of heuristics tried fail, then
the program tries adding or deleting the center region. Then, in
desperation, it enters the remaining pair of heuristics. If none of these
work, it reports that it has failed.
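
The control flow just described can be sketched as follows. Several things
here are assumptions rather than facts about the original program: that
areas 1 to 8 form a ring around a center area 9, so that each ring area
has exactly two non-center neighbours; my reading of "separated" and
"interior" in terms of those neighbours; the choice of the best candidate
within each group of moves (the original may have taken the first that
helped); and every function name. The caller supplies `evaluate`, which
returns the pair (B, P) for a region.

```python
RING = range(1, 9)      # assumed: areas 1..8 circle the center area 9
CENTER = 9


def ring_neighbours(cell):
    """The two ring areas bordering `cell` (assumed geometry)."""
    return ((cell - 2) % 8 + 1, cell % 8 + 1)


def branch_moves(region):
    """Drop an isolated shaded area, or fill a gap between two shaded ones."""
    for cell in RING:
        a, b = ring_neighbours(cell)
        if cell in region and a not in region and b not in region:
            yield region - {cell}
        if cell not in region and a in region and b in region:
            yield region | {cell}


def probability_moves(region):
    """Add a separated area, or drop an interior one."""
    for cell in RING:
        a, b = ring_neighbours(cell)
        if cell not in region and a not in region and b not in region:
            yield region | {cell}
        if cell in region and a in region and b in region:
            yield region - {cell}


def center_moves(region):
    """Add or delete the center area."""
    yield region ^ {CENTER}


def improve(region, evaluate, c):
    """Return a better region one step away, or None if every move fails."""
    def badness(r):
        b, p = evaluate(r)
        return b * b + c * p

    b, p = evaluate(region)
    if b * b - c * p > 0:       # branches dominate: branch heuristics first
        order = [branch_moves, center_moves, probability_moves]
    else:
        order = [probability_moves, center_moves, branch_moves]
    for moves in order:
        candidates = list(moves(region))
        if candidates:
            best = min(candidates, key=badness)
            if badness(best) < badness(region):
                return best
    return None                 # the program reports failure


def search(region, evaluate, c):
    """Apply `improve` until no single change lowers the badness factor."""
    while True:
        better = improve(region, evaluate, c)
        if better is None:
            return region
        region = better
```

Because every accepted step strictly lowers the badness factor and there
are finitely many regions, the loop always stops, though not necessarily
at the best region overall.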



RESULTS 

English Block Capitals : 

The program was used on the 26-letter English alphabet with
encouraging results. Starting with nothing in the shaded region, the
program evolved the 2-3*4-6-6 region of figure 14. Only 63 branches
were used, of which 12 were w branches that cannot bear letters. The L
is confused with 2, and V with V-. The program looped 8 times and tried
17 of the possible 512 partitionings before finding the best it could
come up with. The region changes are indicated below.

Figure 14. Block capital region.



(The first figure is B, the number of branches; the second figure is P,
the probability of error.)

probability heuristic called and used
probability heuristic called and used
branch heuristic called and used
(54 5/26)
probability heuristics called and not used
center change did no good
desperation branch heuristic used
(64 2/26)



Greek Lower Case :

Results were even better using the lower case Greek letters in
the form seen in figure 15. The final tree had 52 branches with only
> and I sharing a node. Again, the program was called 6 times and again
the first heuristic tried was used in 6 of the 7 successful attempts
at improvement. Only 12 configurations were tried before the winner was
found. The winning tree is shown in figure 16.

Figure 15. Greek alphabet.

Figure 16. Greek tree.

(58 3/24)
branch heuristic called and used
probability heuristics called and not used
center change did no good
desperation branch heuristic used
failed



A second experiment with the Greek letters was performed to see if
the program would home in on the 2*^-5-6-8 region from another starting
point. The 9-4-8 region was selected more or less at random as an
alternate starting point, and the program successfully reached the same end
state, this time in 6 steps. See figure 17.

Figure 17. Sequence of areas reported
in second Greek alphabet experiment.



A third experiment started from 1-9-5-7-2 but failed without 
arriving at the former solutions. See figure 18. 




Figure 18. Sequence of areas reported 
in third Greek alphabet experiment. 
CONCLUSIONS + EXTENSIONS

Conclusions :

The program as it stands seemed to work quite well. In general,
the first heuristic tried succeeded in improving the badness factor
by reducing the branch or probability factor it was designed for.
Experiments with the Greek alphabet indicate that final results do not
depend strongly on the starting pattern: in two cases the same final
pattern evolved, and in a third, the final pattern was different, but the
badness factor was nearly the same.

Further Work : 

It would be interesting to pump in more and more characters to see
at what point saturation can be exhibited. That is, about how
many characters are required before the program is incapable of reducing
the probability of error below some arbitrary figure, say 1/5.

The badness factor, W = B^2 + c P, worked surprisingly well. The
powers used, the constant c, and the general form were a bit arbitrary and
might be improved.

Generalizing the problem to the case in which a character is
associated with several input sequences would certainly require a more
general function.

If more than one tree is allowed for recognition, the problem becomes
more difficult. W would have to apply to sets of trees, and heuristics
would have to be found to operate on sets of partitionings.



